AITopics | fuzzy forest

Fuzzy Forests For Feature Selection in High-Dimensional Survey Data: An Application to the 2020 U.S. Presidential Election

Dey, Sreemanti, Alvarez, R. Michael

arXiv.org Machine Learning, Mar-5-2022

An increasingly common methodological issue in social science is high-dimensional, highly correlated datasets that do not lend themselves to the traditional deductive framework of study. Analysis of candidate choice in the 2020 Presidential Election is one area where this issue arises. Testing the many theories that explain the outcome of the election requires data such as the 2020 Cooperative Election Study Common Content, which has hundreds of highly correlated features. We present the Fuzzy Forests algorithm, a variant of the popular Random Forests ensemble method, as an efficient way to reduce the feature space in such cases with minimal bias, while maintaining predictive performance on par with common algorithms such as Random Forests and logit. Using Fuzzy Forests, we isolate the top correlates of candidate choice and find that partisan polarization was the strongest factor driving the 2020 presidential election.

Social science research today often faces a difficult methodological situation: ever larger datasets whose features are both high-dimensional and highly correlated [7]. In the application we discuss in our paper, the 2020 U.S. Presidential election, testing the many theories and potential explanations for why voters removed then-President Trump from office requires methods that can quickly and efficiently reduce the feature space from hundreds of possible features to a smaller set that can then be the focus of further study. In our paper we present a variant of the popular Random Forest, Fuzzy Forests, which we argue is well suited to exactly this type of applied machine learning problem [6]: feature selection in large, high-dimensional datasets whose features are highly correlated.

candidate choice, dataset, fuzzy forest

arXiv:2203.02818

Country:

North America > United States > California (0.04)
Asia > Middle East > Iran (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.78)

FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features

Xu, Yuancheng, Zafirov, Athanasse, Alvarez, R. Michael, Kojis, Dan, Tan, Min, Ramirez, Christina M.

arXiv.org Machine Learning, Jun-17-2020

This paper proposes FREEtree, a tree-based method for high-dimensional longitudinal data with correlated features. Popular machine learning approaches commonly used for variable selection, such as Random Forests, do not perform well when features are correlated, and they do not account for data observed over time. FREEtree handles longitudinal data with a piecewise random effects model. It also exploits the network structure of the features by first clustering them with weighted correlation network analysis (WGCNA). It then conducts a screening step within each cluster of features and a selection step among the surviving features, which provides a relatively unbiased way to select features. By using dominant principal components as regression variables at each leaf and the original features as splitting variables at splitting nodes, FREEtree maintains its interpretability and improves its computational efficiency. The simulation results show that FREEtree outperforms other tree-based methods in prediction accuracy, feature selection accuracy, and the ability to recover the underlying structure.

artificial intelligence, freetree, machine learning

arXiv:2006.09693

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine (1.00)
Banking & Finance > Economy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)